Tags: data science*

0 bookmark(s) - Sort by: Date ↓ / Title /

  1. The author discusses a shift in approach to clustering mixed data, advocating for starting with the simpler Gower distance metric before resorting to more complex embedding techniques like UMAP. They introduce 'Gower Express', an optimized and accelerated implementation of Gower.
  2. This article details a hands-on approach to modeling rare events in time series data using Python. It covers data exploration, defining extreme events, fitting distributions (GEV, Weibull, Gumbel), and evaluating model performance using metrics like log-likelihood, AIC, and BIC. The example uses weather data and provides code snippets for implementation.
  3. This article explores the impact of hyperparameters on random forests, both in terms of performance and visual representation. It compares the performance of a default random forest with tuned decision trees and examines the effects of various hyperparameters like `n_estimators`, `max_depth`, and `ccp_alpha` using visualizations of individual trees, predictions, and errors.
  4. Extracting structured information effectively and accurately from long unstructured text with LangExtract and LLMs. This article explores Google’s LangExtract framework and its open-source LLM, Gemma 3, demonstrating how to parse an insurance policy to surface details like exclusions.
  5. Learn how to connect several essential tools to develop a simple yet intuitive dashboard using Streamlit, Plotly, DuckDB, and Pandas to visualize data from a JSON file.
  6. This article explores alternatives to NotebookLM, a Google assistant for synthesizing information from documents. It details NousWise, ElevenLabs, NoteGPT, Notion, Evernote, and Obsidian, outlining their key features, limitations, and considerations for choosing the right tool.
  7. This article explores gamma spectroscopy using a Radiacode 103G detector and Python, detailing data collection, analysis, and experiments with various objects to identify radioactive elements.
  8. A deep dive into advanced evaluation for data scientists, discussing why accuracy is often misleading and exploring alternative metrics for classification and regression tasks like ROC-AUC, Log Loss, R², RMSLE, and Quantile Loss.
  9. Leveraging MCP for automating your daily routine. This article explores the Model Context Protocol (MCP) and demonstrates how to build a toolkit for analysts using it, including creating a local MCP server with useful tools and integrating it with AI tools like Claude Desktop.
  10. A guide to building a front-end data application using Taipy, comparing it to Streamlit and Gradio, and providing a step-by-step implementation of a sales performance dashboard.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: tagged with "data science"

About - Propulsed by SemanticScuttle